จากคำชี้แจงสู่การเลียนแบบ: กลไกของการเรียนรู้ในบริบท

ในโมดูลนี้ เราจะเปลี่ยนจากแนวทางดั้งเดิมของการปรับแต่งแบบใช้น้ำหนัก (weight-based fine-tuning) สู่โลกที่มีพลวัตของ การเรียนรู้ในบริบท (ICL)เราสำรวจว่าโมเดลภาษาขนาดใหญ่ (LLMs) สามารถทำให้สำเร็จงานได้ โดยไม่ต้องเปลี่ยนโครงสร้างภายใน แต่อาศัยโครงสร้างของคำสั่ง (prompt) เพื่อเดินทางผ่านพื้นที่ลึกลับที่ซับซ้อน

1. จากการบอกให้ ไปสู่การแสดงให้เห็น

ขณะที่คำชี้แจงให้ทิศทางทั่วไป แต่การเลียนแบบผ่านคู่ข้อมูล (x, y) จะทำหน้าที่เป็นแนวทางที่ไม่ใช้พารามิเตอร์ ตัวอย่างเหล่านี้กลายเป็นจุดยึดทางสถิติ ที่ทำให้การแจกแจงความน่าจะเป็นของโมเดลแคบลง ลดความคลุมเครือที่มีอยู่ในคำชี้แจงภาษาธรรมชาติที่ยังไม่ถูกปรับปรุง

2. กลไกของความสนใจ (Attention)

ICL อาศัยกลไกความสนใจของโมเดล Transformer เพื่อทำการ "การนำเข้างาน (task induction)" โดยการระบุรูปแบบที่เป็นระบบในลำดับที่คุณให้มา โมเดลจะหาตำแหน่งเฉพาะของการแปลงเชิงฟังก์ชันในพื้นที่มิติสูง ทำให้มันสามารถเลียนแบบสไตล์และโครงสร้างได้อย่างแม่นยำ

แม่แบบรูปแบบ ICL

[บริบท/คำชี้แจง]: "แปลคำศัพท์ทางเทคนิคต่อไปนี้เป็นภาษาที่เข้าใจง่ายโดยไม่ใช้ศัพท์เฉพาะเจาะจง" [ตัวอย่าง 1]: "เข้า: ช่องลึก (Latent Space) | ออก: แผนที่ทางคณิตศาสตร์ที่ซ่อนอยู่ ซึ่งโมเดลใช้เก็บแนวคิดต่างๆ" [ตัวอย่าง 2]: "เข้า: ทรานสฟอร์เมอร์ (Transformer) | ออก: สถาปัตยกรรมปัญญาประดิษฐ์ที่พิจารณาความสำคัญของคำต่างๆ ในประโยค" [ข้อมูลทดสอบ]: "เข้า: การเรียนรู้ในบริบท | ออก: "

Type a message... (Disabled in Demo Mode)

Mechanics Check

Mechanically speaking, what is the primary role of providing $(x, y)$ pairs in a prompt?

To retrain the model's neural weights for a specific task.

To act as anchors that resolve ambiguity and narrow the prediction distribution.

To increase the model's processing speed by reducing sequence length.

To bypass the attention mechanism entirely.

Challenge: From Instruction to Imitation

Imitation Mastery

Vague Instruction: "Rewrite these emails to be professional."

Goal: Provide a three-exemplar few-shot prompt that teaches the model a specific "Concise Executive" style, rather than just a generic professional tone.

Analysis

Why is providing specific examples more effective than simply adding the adjective "Concise" to the instruction?

Solution:
Adjectives like "Concise" are subjective and have broad probability distributions; examples provide a concrete structural template that the attention mechanism can emulate with mathematical precision.